Goto

Collaborating Authors

 conversation model


Neural Conversation Models and How to Rein Them in: A Survey of Failures and Fixes

Galetzka, Fabian, Beyer, Anne, Schlangen, David

arXiv.org Artificial Intelligence

Recent conditional language models are able to continue any kind of text source in an often seemingly fluent way. This fact encouraged research in the area of open-domain conversational systems that are based on powerful language models and aim to imitate an interlocutor by generating appropriate contributions to a written dialogue. From a linguistic perspective, however, the complexity of contributing to a conversation is high. In this survey, we interpret Grice's maxims of cooperative conversation from the perspective of this specific research area and systematize the literature under the aspect of what makes a contribution appropriate: A neural conversation model has to be fluent, informative, consistent, coherent, and follow social norms. In order to ensure these qualities, recent approaches try to tame the underlying language models at various intervention points, such as data, training regime or decoding. Sorted by these categories and intervention points, we discuss promising attempts and suggest novel ways for future research.


An Empirical Study of Multitask Learning to Improve Open Domain Dialogue Systems

Farahani, Mehrdad, Johansson, Richard

arXiv.org Artificial Intelligence

Autoregressive models used to generate responses in open-domain dialogue systems often struggle to take long-term context into account and to maintain consistency over a dialogue. Previous research in open-domain dialogue generation has shown that the use of \emph{auxiliary tasks} can introduce inductive biases that encourage the model to improve these qualities. However, most previous research has focused on encoder-only or encoder/decoder models, while the use of auxiliary tasks in \emph{decoder-only} autoregressive models is under-explored. This paper describes an investigation where four different auxiliary tasks are added to small and medium-sized GPT-2 models fine-tuned on the PersonaChat and DailyDialog datasets. The results show that the introduction of the new auxiliary tasks leads to small but consistent improvement in evaluations of the investigated models.


THINK: A Novel Conversation Model for Generating Grammatically Correct and Coherent Responses

Sun, Bin, Feng, Shaoxiong, Li, Yiwei, Liu, Jiamou, Li, Kan

arXiv.org Artificial Intelligence

Many existing conversation models that are based on the encoder-decoder framework have focused on ways to make the encoder more complicated to enrich the context vectors so as to increase the diversity and informativeness of generated responses. However, these approaches face two problems. First, the decoder is too simple to effectively utilize the previously generated information and tends to generate duplicated and self-contradicting responses. Second, the complex encoder tends to generate diverse but incoherent responses because the complex context vectors may deviate from the original semantics of context. In this work, we proposed a conversation model named "THINK" (Teamwork generation Hover around Impressive Noticeable Keywords) to make the decoder more complicated and avoid generating duplicated and self-contradicting responses. The model simplifies the context vectors and increases the coherence of generated responses in a reasonable way. For this model, we propose Teamwork generation framework and Semantics Extractor. Compared with other baselines, both automatic and human evaluation showed the advantages of our model.


Plug-and-Play Conversational Models

Madotto, Andrea, Ishii, Etsuko, Lin, Zhaojiang, Dathathri, Sumanth, Fung, Pascale

arXiv.org Artificial Intelligence

There has been considerable progress made towards conversational models that generate coherent and fluent responses; however, this often involves training large language models on large dialogue datasets, such as Reddit. These large conversational models provide little control over the generated responses, and this control is further limited in the absence of annotated conversational datasets for attribute specific generation that can be used for fine-tuning the model. In this paper, we first propose and evaluate plug-and-play methods for controllable response generation, which does not require dialogue specific datasets and does not rely on fine-tuning a large model. While effective, the decoding procedure induces considerable computational overhead, rendering the conversational model unsuitable for interactive usage. To overcome this, we introduce an approach that does not require further computation at decoding time, while also does not require any fine-tuning of a large language model. We demonstrate, through extensive automatic and human evaluation, a high degree of control over the generated conversational responses with regard to multiple desired attributes, while being fluent.


DyKgChat: Benchmarking Dialogue Generation Grounding on Dynamic Knowledge Graphs

Tuan, Yi-Lin, Chen, Yun-Nung, Lee, Hung-yi

arXiv.org Artificial Intelligence

Data-driven, knowledge-grounded neural conversation models are capable of generating more informative responses. However, these models have not yet demonstrated that they can zero-shot adapt to updated, unseen knowledge graphs. This paper proposes a new task about how to apply dynamic knowledge graphs in neural conversation model and presents a novel TV series conversation corpus (DyKgChat) for the task. Also, we propose a preliminary model that selects an output from two networks at each time step: a sequence-to-sequence model (Seq2Seq) and a multi-hop reasoning model, in order to support dynamic knowledge graphs. To benchmark this new task and evaluate the capability of adaptation, we introduce several evaluation metrics and the experiments show that our proposed approach outperforms previous knowledge-grounded conversation models. The proposed corpus and model can motivate the future research directions 1 . 1 Introduction In the chitchat dialogue generation, neural conversation models (Sutskever et al., 2014; Sordoni et al., 2015; Vinyals and Le, 2015) have emerged for its capability to be fully data-driven and end-to-end trained. While the generated responses are often reasonable but general (without useful information), recent work proposed knowledge-grounded models (Eric et al., 2017; Ghazvinine-jad et al., 2018; Zhou et al., 2018b; Qian et al., 2018) to incorporate external facts in an end-to- end fashion without handcrafted slot filling. Effectively combining text and external knowledge1 The data and code are available in https://github. Nonetheless, prior work rarely analyzed the model capability of zero-shot adaptation to dynamic knowledge graphs, where the states/entities and their relations are temporal and evolve as a single time scale process.


Conversing by Reading: Contentful Neural Conversation with On-demand Machine Reading

Qin, Lianhui, Galley, Michel, Brockett, Chris, Liu, Xiaodong, Gao, Xiang, Dolan, Bill, Choi, Yejin, Gao, Jianfeng

arXiv.org Artificial Intelligence

Although neural conversation models are effective in learning how to produce fluent responses, their primary challenge lies in knowing what to say to make the conversation contentful and non-vacuous. We present a new end-to-end approach to contentful neural conversation that jointly models response generation and on-demand machine reading. The key idea is to provide the conversation model with relevant long-form text on the fly as a source of external knowledge. The model performs QA-style reading comprehension on this text in response to each conversational turn, thereby allowing for more focused integration of external knowledge than has been possible in prior approaches. To support further research on knowledge-grounded conversation, we introduce a new large-scale conversation dataset grounded in external web pages (2.8M turns, 7.4M sentences of grounding). Both human evaluation and automated metrics show that our approach results in more contentful responses compared to a variety of previous methods, improving both the informativeness and diversity of generated output.


Conversation Model Fine-Tuning for Classifying Client Utterances in Counseling Dialogues

Park, Sungjoon, Kim, Donghyun, Oh, Alice

arXiv.org Machine Learning

The recent surge of text-based online counseling applications enables us to collect and analyze interactions between counselors and clients. A dataset of those interactions can be used to learn to automatically classify the client utterances into categories that help counselors in diagnosing client status and predicting counseling outcome. With proper anonymization, we collect counselor-client dialogues, define meaningful categories of client utterances with professional counselors, and develop a novel neural network model for classifying the client utterances. The central idea of our model, ConvMFiT, is a pre-trained conversation model which consists of a general language model built from an out-of-domain corpus and two role-specific language models built from unlabeled in-domain dialogues. The classification result shows that ConvMFiT outperforms state-of-the-art comparison models. Further, the attention weights in the learned model confirm that the model finds expected linguistic patterns for each category.


Jointly Optimizing Diversity and Relevance in Neural Response Generation

Gao, Xiang, Lee, Sungjin, Zhang, Yizhe, Brockett, Chris, Galley, Michel, Gao, Jianfeng, Dolan, Bill

arXiv.org Artificial Intelligence

Although recent neural conversation models have shown great potential, they often generate bland and generic responses. While various approaches have been explored to diversify the output of the conversation model, the improvement often comes at the cost of decreased relevance. In this paper, we propose a method to jointly optimize diversity and relevance that essentially fuses the latent space of a sequence-to-sequence model and that of an autoencoder model by leveraging novel regularization terms. As a result, our approach induces a latent space in which the distance and direction from the predicted response vector roughly match the relevance and diversity, respectively. This property also lends itself well to an intuitive visualization of the latent space. Both automatic and human evaluation results demonstrate that the proposed approach brings significant improvement compared to strong baselines in both diversity and relevance.


DialogWAE: Multimodal Response Generation with Conditional Wasserstein Auto-Encoder

Gu, Xiaodong, Cho, Kyunghyun, Ha, Jung-Woo, Kim, Sunghun

arXiv.org Artificial Intelligence

Variational autoencoders (VAEs) have shown a promise in data-driven conversation modeling. However, most VAE conversation models match the approximate posterior distribution over the latent variables to a simple prior such as standard normal distribution, thereby restricting the generated responses to a relatively simple (e.g., unimodal) scope. In this paper, we propose DialogWAE, a conditional Wasserstein autoencoder (WAE) specially designed for dialogue modeling. Unlike VAEs that impose a simple distribution over the latent variables, DialogWAE models the distribution of data by training a GAN within the latent variable space. We further develop a Gaussian mixture prior network to enrich the latent space. Experiments on two popular datasets show that DialogWAE outperforms the state-of-the-art approaches in generating more coherent, informative and diverse responses. Neural response generation has been a long interest of natural language research. Most of the recent approaches to data-driven conversation modeling primarily build upon sequence-to-sequence learning (Cho et al., 2014; Sutskever et al., 2014). Previous research has demonstrated that sequenceto-sequence conversation models often suffer from the safe response problem and fail to generate meaningful, diverse on-topic responses (Li et al., 2015; Sato et al., 2017).


Shaping the Narrative Arc: An Information-Theoretic Approach to Collaborative Dialogue

Mathewson, Kory W., Castro, Pablo Samuel, Cherry, Colin, Foster, George, Bellemare, Marc G.

arXiv.org Artificial Intelligence

We consider the problem of designing an artificial agent capable of interacting with humans in collaborative dialogue to produce creative, engaging narratives. In this task, the goal is to establish universe details, and to collaborate on an interesting story in that universe, through a series of natural dialogue exchanges. Our model can augment any probabilistic conversational agent by allowing it to reason about universe information established and what potential next utterances might reveal. Ideally, with each utterance, agents would reveal just enough information to add specificity and reduce ambiguity without limiting the conversation. We empirically show that our model allows control over the rate at which the agent reveals information and that doing so significantly improves accuracy in predicting the next line of dialogues from movies. We close with a case-study with four professional theatre performers, who preferred interactions with our model-augmented agent over an unaugmented agent.